Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CDAP-15361] Add Schema handling in Wrangler #645

Merged
merged 6 commits into from
Jul 25, 2023
Merged

[CDAP-15361] Add Schema handling in Wrangler #645

merged 6 commits into from
Jul 25, 2023

Conversation

vanathi-g
Copy link
Contributor

@vanathi-g vanathi-g commented Jun 19, 2023

Background

Currently, Wrangler infers the CDAP schema of the data where necessary during design-time (UI display, creating plugin/applying directives). This leads to inference errors and information loss. Hence, a mechanism for directives to correctly update schema after transformation is implemented and input schema information is not lost.

Changes

  • Added getOutputSchema method to Executor interface (Directives can use this to return custom schema after transform)
  • Input schema and output schema after applying a recipe are stored in TransientStore and accessed when necessary.
  • WorkspaceHandler's specification and AbstractDirectiveHandler's generateExecutionResponse method now use the output schema instead of inferring
  • For backwards compatibility, if a directive does not override getOutputSchema, the schema is generated as a fallback mechanism

NOTE: the schema changes are only applicable in design time (when Wrangler runs as a service, not in the plugin)

@vanathi-g vanathi-g added the build Triggers unit test build label Jun 19, 2023
@vanathi-g vanathi-g marked this pull request as ready for review June 21, 2023 12:40
Copy link
Contributor

@tivv tivv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refer to JIRA int the PR title

@vanathi-g vanathi-g changed the title Add Schema handling in Wrangler [CDAP-15361] Add Schema handling in Wrangler Jun 23, 2023
@vanathi-g vanathi-g requested review from tivv and iam-divya June 23, 2023 11:21
@vanathi-g vanathi-g requested a review from tivv July 11, 2023 07:15
if (designTime && schema != null) {
Schema directiveOutputSchema = directive.getOutputSchema(schema);
schema = directiveOutputSchema != null ? directiveOutputSchema
: generateOutputSchema(schema, cumulativeRows);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should introduce a feature flag for this with default enabled.. If schema generator fails on some customer schema, there should be a way to disable it

@vanathi-g vanathi-g requested a review from tivv July 21, 2023 11:26
@vanathi-g vanathi-g requested a review from tivv July 24, 2023 16:12
Map<String, String> types = new LinkedHashMap<>();

Schema outputSchema = TRANSIENT_STORE.get(TransientStoreKeys.OUTPUT_SCHEMA) != null ?
TRANSIENT_STORE.get(TransientStoreKeys.OUTPUT_SCHEMA) : TRANSIENT_STORE.get(TransientStoreKeys.INPUT_SCHEMA);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which cases OUTPUT_SCHEMA will be null? This "else" part seems problematic for me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially only input schema is set in transient store (when workspace created for ex)

@vanathi-g vanathi-g merged commit 6e5814c into develop Jul 25, 2023
@vanathi-g vanathi-g deleted the add-schema branch July 25, 2023 05:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Triggers unit test build
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants